About Statistical classification

Statistical classification: from the English Wikipedia

Statistical classification

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Examples include assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a patient based on the patient's observed characteristics (gender, blood pressure, presence or absence of certain symptoms, etc.).
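To make the setup concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid classifier, chosen only for brevity (the text does not prescribe any particular method). The training data are made-up numbers for a hypothetical spam filter: each email is reduced to two features, the occurrences of the word "free" and the number of links. The names `fit_centroids` and `predict` are illustrative.

```python
from math import dist
from statistics import fmean


def fit_centroids(X, y):
    """Return the mean feature vector (centroid) of each class in the training set."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # zip(*rows) yields one tuple per feature column
        centroids[label] = tuple(fmean(col) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical training set (made-up numbers):
# features = (occurrences of "free", number of links), label = known class
X_train = [(5, 8), (7, 6), (6, 9), (0, 1), (1, 0), (0, 2)]
y_train = ["spam", "spam", "spam", "non-spam", "non-spam", "non-spam"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, (4, 5)))  # spam
print(predict(centroids, (0, 0)))  # non-spam
```

The "training" step here only summarizes each known class by its mean; the new observation is then assigned to whichever summary it most resembles.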
In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available. The corresponding unsupervised procedure is known as clustering, and involves grouping data into categories based on some measure of inherent similarity or distance.
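For contrast with the supervised setting, the sketch below runs k-means, one common clustering algorithm (picked here as an example; the text does not name one), on unlabeled points. No class labels are ever supplied: points are grouped purely by Euclidean distance to the nearest cluster center. Initializing the centers with the first `k` points is a simplifying assumption; practical implementations usually use smarter seeding such as k-means++.

```python
from math import dist
from statistics import fmean


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternately assign each point to its nearest center
    and move each center to the mean of the points assigned to it."""
    centers = list(points[:k])  # naive initialization (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [
            tuple(fmean(col) for col in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled toy data (made up): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, clusters = kmeans(points, k=2)
for cluster in clusters:
    print(cluster)
```

The output is a grouping, not a named category: nothing tells the algorithm what the groups mean, which is exactly the difference from classification.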
Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or ''features''. These properties may be categorical (e.g. "A", "B", "AB" or "O" for blood type), ordinal (e.g. "large", "medium" or "small"), integer-valued (e.g. the number of occurrences of a particular word in an email) or real-valued (e.g. a measurement of blood pressure). Other classifiers work by comparing observations to previous observations by means of a similarity or distance function.
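The sketch below illustrates both ideas in this paragraph under stated assumptions. `encode` turns a hypothetical patient record, with one feature of each kind, into a numeric feature vector: the categorical blood type becomes a one-hot block (implying no order), the ordinal size becomes a rank, and the integer and real values pass through unchanged. `knn_predict` is a k-nearest-neighbors classifier, one example of a method that classifies by comparing a new observation with previous ones through a distance function (Euclidean here). The record fields and function names are illustrative, not taken from any library.

```python
from collections import Counter
from math import dist

BLOOD_TYPES = ["A", "B", "AB", "O"]                # categorical: no ordering
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}  # ordinal: ordering matters


def encode(blood_type, size, symptom_count, blood_pressure):
    """Map one mixed-type record to a list of floats (hypothetical schema)."""
    one_hot = [1.0 if blood_type == t else 0.0 for t in BLOOD_TYPES]
    return one_hot + [float(SIZE_RANK[size]), float(symptom_count), float(blood_pressure)]


def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among its k nearest training observations."""
    nearest = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], x))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


print(encode("AB", "medium", 2, 128.5))
# [0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 128.5]

# Toy 2-D training set (made-up numbers) for the k-NN part.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, (2.0, 2.0)))  # a
```

In practice a feature with a large range (blood pressure here) would dominate the Euclidean distance unless features are scaled first; that step is omitted to keep the sketch short.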
An algorithm that implements classification, especially in a concrete implementation, is known as a classifier. The term "classifier" sometimes also refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category.
Terminology across fields is quite varied. In statistics, where classification is often done with logistic regression or a similar procedure, the properties of observations are termed explanatory variables (or independent variables, regressors, etc.), and the categories to be predicted are known as outcomes, which are considered to be possible values of the dependent variable. In machine learning, the observations are often known as ''instances'', the explanatory variables are termed ''features'' (grouped into a feature vector), and the possible categories to be predicted are ''classes''. There is also some argument over whether classification methods that do not involve a statistical model can be considered "statistical". Other fields may use different terminology: e.g. in community ecology, the term "classification" normally refers to cluster analysis, i.e. a type of unsupervised learning, rather than the supervised learning described in this article.
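For the statistical side of this terminology, the following is a minimal binary logistic regression sketch, fitted by plain batch gradient descent on the mean log-loss. Gradient descent is an assumption chosen for simplicity; statistical packages often use other solvers, such as iteratively reweighted least squares. The single explanatory variable holds made-up values, and the outcome (the dependent variable) is coded 0 or 1. The learning rate, epoch count and function names are illustrative.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated P(outcome = 1 | explanatory variables x)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/dz for this observation
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up data: one explanatory variable, binary outcome.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
label = 1 if predict_proba(w, b, [4.5]) >= 0.5 else 0
print(label)  # 1
```

Because this toy data is perfectly separable, the maximum-likelihood weights do not exist (they grow without bound), so the fixed epoch count acts as an implicit stopping rule.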
==Relation to other problems==
Classification and clustering are examples of the more general problem of pattern recognition, which is the assignment of some sort of output value to a given input value. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (for example, part-of-speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.
A common subclass of classification is probabilistic classification. Algorithms of this nature use statistical inference to find the best class for a given instance. Unlike other algorithms, which simply output a "best" class, probabilistic algorithms output a probability of the instance being a member of each of the possible classes. The best class is then normally selected as the one with the highest probability. Even so, producing the full set of probabilities gives such an algorithm several advantages over non-probabilistic classifiers:
*It can output a confidence value associated with its choice (in general, a classifier that can do this is known as a ''confidence-weighted classifier'').
*Correspondingly, it can ''abstain'' when its confidence in choosing any particular output is too low.
*Because they generate probabilities, probabilistic classifiers can be incorporated more effectively into larger machine-learning tasks, in a way that partially or completely avoids the problem of ''error propagation''.
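The advantages listed above can be sketched in a few lines. The sketch assumes some model has already produced a real-valued score per class (the scores below are made up). `softmax` turns the scores into class probabilities; `decide` picks the most probable class, reports its probability as a confidence value, and abstains (returns `None`) when that confidence falls below a threshold. The threshold of 0.7 and all names are hypothetical.

```python
from math import exp


def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def decide(probs, min_confidence=0.7):
    """Return (best_class, confidence), or (None, confidence) to abstain."""
    best = max(probs, key=probs.get)
    confidence = probs[best]
    if confidence < min_confidence:
        return None, confidence
    return best, confidence


clear = softmax({"spam": 3.0, "non-spam": 0.0})
unclear = softmax({"spam": 0.1, "non-spam": 0.0})
print(decide(clear))    # "spam" with confidence about 0.95
print(decide(unclear))  # abstains: confidence only about 0.52
```

Keeping the whole probability dictionary, rather than only the winning label, is one way to let a downstream component weigh the alternatives instead of inheriting a single hard decision.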

Source of the excerpt: Wikipedia, the free encyclopedia